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Abstract — The paper proposes Monte Carlo algorithms for 
the computation of the information rate of two-dimensional 
source / channel models. The focus of the paper is on binary-input 
channels with constraints on the allowed input configurations. 
The problem of numerically computing the information rate, and 
even the noiseless capacity, of such channels has so far remained 
largely unsolved. Both problems can be reduced to computing 
a Monte Carlo estimate of a partition function. The proposed 
algorithms use tree-based Gibbs sampling and multilayer (multi- 
temperature) importance sampling. The viability of the proposed 
algorithms is demonstrated by simulation results. 

Index Terms — Two-dimensional channels, constrained chan- 
nels, partition function, Gibbs sampling, importance sampling, 
factor graphs, sum-product message passing, capacity, informa- 
tion rate. 



I. Introduction 

Numerically computing the Shannon information rate for 
source /channel models with memory can be difficult. In many 
cases of practical interest, analytical results are not available 
or hard to evaluate numerically. For a large class of channels, 
however, Monte Carlo methods as proposed in ||T)-||3) have 
been shown to yield reliable numerical results. 

In this paper, we consider the extension of such Monte Carlo 
methods to source / channel models with a two-dimensional 
(2-D) structure. The focus of the paper is on 2-D binary-input 
channels with constraints on the allowed input configurations; 
for example, we consider the channel where adjacent channel 
input symbols must not both have the value 1. Variations of 
such channels are of interest in magnetic and optical storage, 
where the constraints are imposed, e.g., in order to reduce 
the intersymbol interference or to help in timing control Q- 
||8). We will consider both noiseless and noisy versions of 
such channels. With suitable modifications (simplifications), 
the methods of this paper can also be applied to other 2-D 
source / channel models such as channels with intersymbol 
interference. 

In the one-dimensional (1-D) case, computing the capacity 
of noiseless constrained channels was addressed and solved by 
Shannon [^J, see also |4 |. For the noisy case, the Monte Carlo 
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methods of |[T]-|[3| can be used to compute the information 
rate. The 2-D case is harder Even the noiseless capacity is hard 
to compute numerically: while very tight analytical results 
are available for a number of special cases (e.g., 1 10|-p3)), 
other cases have remained open problems. The noisy case has 
remained largely unsolved. 

The capacity of a noiseless constrained channel is essen- 
tially the logarithm of the partition function of the correspond- 
ing indicator function (see Section |ll]). Moreover, computing 
the information rate of noisy source / channel models can also 
be reduced to the computation of a partition function (see 



Section VI i. The heart of the paper, therefore, are Monte Carlo 
algorithms for the computation of partition functions. Several 



such algorithms are well known 1 19 |-pT), see also |22|, p3). 



but some modifications will be necessary for the problems of 
interest in this paper In particular, we will find tree-based 
Gibbs sampling (due to Hamze and de Freitas f2^) extremely 
useful. We will observe that Monte Carlo estimates of a 
partition function may actually be obtained as a by-product 
of tree-based Gibbs sampling, which does not seem to have 
been noticed before. 

In related prior work, Monte Carlo algorithms have been 
used to compute bounds on, or approximations of, the in- 
formation rate of 2-D source / channels with memory ||25|, 
p6| . Some of this work uses generalized belief propagation 
|27i|, which appears to yield very good approximations to the 
information rate |25), fTS), |j28), g9). 

In contrast to all this prior work, the Monte Carlo methods 
of this paper are asymptotically unbiased, i.e., in the limit of 
infinitely many samples, the estimates are guaranteed to con- 
verge to the desired quantity (the information rate). Moreover, 
the focus of this paper is on constrained channels, for which 
these computational problems are harder than for intersymbol 
interference channels (cf. Section [VI-B| i. 

The empirical success of the proposed algorithms is epito- 
mized by Fig. [8j which shows the uniform-input information 
rate of a binary-input channel with input constraints and 
additive white Gaussian noise (AWGN). As far as known to the 
authors, no such figure (for such a channel) has been presented 
before. 

If the reader is not familiar with Gibbs sampling, the 
following comments on the general nature of this work may 
be in order First, Gibbs sampling is easily proved (under 
very mild conditions) to yield samples that are eventually 
distributed according to the desired distribution and asymp- 
totically independent |23| (i.e., deleting the first t samples 



and decimating the remaining sample sequence by a factor m 
results in an i.i.d. sequence in the limit i, m -> cxi). However, 
the dependencies among the samples may decay extremely 
slowly, which is the pivotal issue with Gibbs sampling and 
makes naive Gibbs sampling perfectly useless for the problems 
of this paper (and for many other problems). The challenge, 
therefore, is to speed up Gibbs sampling (i.e., to decrease the 
dependencies of the samples) by various additional tricks and 
insights so that it becomes useful. 

Second, the class of problems for which the methods 
proposed in this paper will work is not easily expressed 
in exact terms. Again, the issue is not formal applicability 
(which is quite sweeping), but the speed of convergence, which 
strongly depends on the particulars of the case and is not easily 
predicted. 

The paper is organized as follows. In Section |ll] we review 
partition functions and noiseless 2-D constrained channels, and 
we introduce the corresponding notation. In Section III we 



review several Monte Carlo algorithms that will be used in 
this paper However, additional ideas are necessary to make 
these algorithms work for our applications. In particular, we 
will need tree-based Gibbs sampling as described in Sec- 
tion IV The application to noiseless constrained channels is 



demonstrated in Section [V] The application to noisy channels 
is described and demonstrated in Section VI The appendix 



reviews sampling from Markov chains, which is needed in 
Section UV] 

II. Partition Function of 2-D Graphical Models 

Let Xi, X2, ■ ■ ■ , '^N be finite sets, let X be the Cartesian 
product X = Xi X X2 X . . . X Xn, and let / be a nonnegative 
function / : A" — > M. We are interested in computing (exactly 
or approximately) the partition function 



E 



/(x) 



(1) 



for cases where 

• is large and 

• / has a suitable factorization (as will be detailed below). 
We will usually assume Zf 7^ 0. Then 

/(x) 



P/(x) 



Z 



f 



(2) 



is a probability mass function on X. We also define the support 
of / (and of pf) as 



Sf^l^eX: /(x) > 0}. 



(3) 



If /(x) has a cycle-free factor graph representation (and 
if jAfil, IA2I, . . . , lAVl are not too large), then Zf can be 
computed efficiently by sum-product message passing pO) , 
In this paper, however, we consider factor graphs with 
cycles. In particular, we are interested in examples of the 
following type. 

Example: Simple 2-D Constrained Channel 

Consider a grid of M x M binary (i.e., {0, 1}- 

valued) variables xi, . . . ,xn with the constraint that no two 
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Fig. L Forney factor graph of the indicator function The unlabeled 
boxes represent factors as in js}. 



(horizontally or vertically) adjacent variables have both the 
value 1. Let / : {0, 1}^ — > {0, 1} be the indicator function of 
this constraint, which can be factored into 



f{xi,...,XN) = Yl K{xk,Xi), 
k, £ adjacent 



(4) 



where the product runs over all adjacent pairs {k,£) and with 
factors 

0, if Xk ^ xe = 1 

1, otherwise. 



K(xk,Xi) 



(5) 



The corresponding Forney factor graph of / is shown in Fig.[T| 
where the boxes labeled are equality constraints |31J. 
(Note that, in Forney factor graphs, variables are represented 
by edges. Fig. [T] may also be viewed as a factor graph as in 
"301 where the boxes labeled "=" are the variable nodes.) 



Note that, in this example, Zf — \Sf\. 
This example is known as the 2-D (1, 00) run-length Umited 
constrained channel ffl. The quantity 



Cm = ^ log2 Zf 



(6) 



is known as the (finite-size) noiseless capacity of the channel. 

For this particular example, upper and lower bounds on the 
infinite-size noiseless capacity 



(7) 



Coo = lim Cm 

Af-s-oo 



were first proposed in pOJ and improved in and p2) . 
The bounds in p2) agree on nine decimal digits, which far 
exceeds the accuracy that can be achieved with the Monte 
Carlo methods of the present paper However, the methods 
proposed in this paper work also for various generalizations 
of this example for which no tight bounds are known. □ 

Later on, in Section |VI] we will consider noisy versions of 
such channels. As it turns out, the computation of the infor- 
mation rates of such channels also requires the computation 
of partition functions as in ([T]l. 



III. Monte Carlo Methods for Partition Function 
Estimation 

One well-known method to estimate Tf = 1/Zf (and thus 
Zf itself) goes as follows. 



Ogata-Tanemura Method ||T9), ||2TJ: 

1) Draw samples x'^^\ x'^^^ . . . , x*^^^ from Sf according 
to Pf{x) as in 

2) Compute 



K 



|5/|^^/(xW) 



(8) 

□ 



It is easy to verify that E(r/) ^ l/Zj. 

However, there are two major issues with this method. 
First, there is the problem of generating the samples 
x(i), xf^^^ . . . , x^^^) according to p/(x). Ideally, we would 
like these samples to be independent, but (for the purposes of 
this paper) this is asking too much. In particular, we will use 
Gibbs sampling 122), 1 33 1, which produces dependent samples. 



However, with naive Gibbs sampling, the dependencies among 
the samples decay far too slowly for the estimate (jS} to be 
useful for us (cf the remarks in the Introduction). We will 



see in Section IV how this issue is eased by tree-based Gibbs 
sampling as proposed by Hamze and de Freitas f24\. 

Second, it is usually assumed that / is strictly positive. In 



this case, Sf — X and \S 



f\ 



\X\ is known. However, this 



assumption excludes applications to constrained channels as in 
the example in Section [ll] Indeed, in that example, we would 
have /(x^*^)) = 1 for all samples x'^'"'\ and \Sf\ = Zf is 
the desired unknown quantity. We will address this issue in 
Section HV^ 

With suitable modifications, which will address the men- 
tioned issues, the Ogata-Tanemura method will turn out to 
work well for the capacity of noiseless constrained 2-D 
channels. 

However, for our second application — the information rate 
of noisy 2-D constrained source / channel models — the Ogata- 
Tanemura method turns out to be inadequate. We will therefore 
resort to multilayer importance sampling as described below. 
We first describe the use of standard (single-layer) importance 
sampling to estimate Zf. 



Importance Sampling p2|, p4): 

1) Draw samples x^^^ x'^^\ . . . , x*^^^ from X according to 
some auxiliary probability distribution q(x) = ^g(x), 
where ^(x) is a nonnegative function defined on X 
satisfying g(x) ^ whenever /(x) ^ 0. 

2) Compute 



R = 



1 



K 



/(xW) 



(9) 

□ 



It is easy to verify that E(i?) Zf/Zg. 

The key issue with importance sampling is to find a useful 
function f/(x) such that 

• q{x) is a good approximation of p(x) (so that f{x)/g(x) 
does not wildly fluctuate), 

• sampling from g(x) is feasible, and 
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Fig. 2. Partition of Fig. [T| into two cycle-free parts (one part inside the two 
ovals, the other part outside the ovals). 



• computing Zg is feasible. 

An obvious choice for ^(x) (and thus g(x)) is 



?(x) - /(x)" 



(10) 



for < a < 1. With this choice, any factorization of /(x) 
results in a factorization of g{x) that preserves the topology 
of the corresponding factor graph. (Note, however, that this 
choice of g(x) is not helpful if /(x) is {0, l}-valued.) 

In order to sample from q{x.), we will again use tree-based 



Gibbs sampling (see Section IV-Ai 



In a variation of the algorithm, the estimator (|9]) of the 
ratio Zf/Zg could be replaced by Bennett's acceptance ratio 
method p5] , which is also known as bridge sampUng p6) . 

A function g(x) with all the required properties may be 
hard to find, or it may not exist. This problem is addressed by 
multilayer importance sampling, which uses several auxiliary 
distributions. 

Multilayer (Multi-Temperature) Importance Sampling: 

Multilayer importance sampling, as described here, is a 
variation of annealed importance sampling as proposed by 
Neal p7) , | |34j ; see also |38|. We use J parallel versions of 
importance sampling as follows. For j = 0, 1 . . . , J, let 



ff,(x)^/(xr^ 



(11) 



with < aj < ■ ■ ■ < ai < ao 



1. Note that Zg^ ^ Zf and 



Zf 



^91 ^92 



(12) 



-'gj 



For i = 1, 2, . . . , J, compute the ratio Zg._^/Zg. by impor- 
tance sampling as described before: 

1) Draw samples x*^^^ , x*^^^ , ■ • • , x*^^) from qj (x) cx gj (x). 



4 



2) Compute 



We then note that 



ill — — 
^ K 



1 ^ 



5,(x(fc)) 



k=l 
K 



K 



k=l 



(13) 

(14) 

□ 



z 



(19) 
(20) 



i.e., /a (and analogously /s) has the same partition function 
as / itself. 



Noting that E(i?j) = 
estimate of Zj/Zgj. 

If the number of layers J is large, gj (x) is a good approx- 
imation of gj_i{x.) at each layer j. 

It remains to find an estimate of Zgj, which tends to be 
easier than the original problem of estimating Zf.ln particular, 
for aj — 0, we have Zg^ — \Sf\, the cardinality of the support 



We also note that samples x 



(1) ^(2) 



from 



P/a(xa)= ^ = 2^p/(xA,Xa) (21) 



/ 



can be obtained by tree-based Gibbs sampling as in Sec- 
tion IV-A simply by dropping the second component (the im- 



part) of the samples (x^\ x^^-*), (x^'',x^''). 



.(2) ^(2)^ 



We can now use the Ogata-Tanemura method (Section IIIi 



of f. In our application (Section VI i, it turns out that Zg, ^ -.• i • » ^ ee » 

Y J «; ■ 1 u u u J /-V estimate 1 / = V^/ m two different ways. One way is to 

directly use the estimator Jsl with samples x*^*"'^ = (x^-*, x^-*) 



can be computed efficiently by the tree-based Ogata-Tanemura 
method of Section IIV-BI 

IV. Tree-Based Gibbs Sampling and Partition 
Function Estimation 

A. Tree-Based Gibbs Sampling 

Tree-based Gibbs sampling was proposed by Hamze and de 
Freitas [24|. It works as follows. Let (^,-8) be a partition of 
the index set {1,2,..., N} such that, 

• for fixed x^, the factor graph of /(x) /(xajXe) is 
cycle-free, and 

• for fixed x^, the factor graph of /(x) = /(x^,Xb) is 
also cycle-free. 

An example of such a partition is shown in Fig. |2] Starting 
from some initial configuration x^''^ = -^(^h 



pies x^ 



first, x^'' is drawn from 

P/(xa|x_b = X 



x^ ), the sam- 



Xg ), fc — 1, 2, . 



are created as follows: 



(A,-l). 



second, x^' is drawn from 



Pfi^sl^A = ^A^) OC /(x^'■■^XB) 



J-/ (fe-l)x 



(fc) 



(15) 



(16) 



The point is that drawing these samples is easy (by means of 
backward-filtering forward-sampling, see the appendix) since 
the corresponding factor graphs are cycle-free. 

As in standard Gibbs sampling, the samples (x^-'jX^-') are 
eventually (i.e., for k — > oo) drawn from (provided that the 
corresponding Markov chain is irreducible and aperiodic |[23|). 
However, tree-based Gibbs sampling mixes faster than naive 
Gibbs sampling. 



B. Application to Partition Function Estimation 
With A and B as above, let 



and 



/a(xa) = ^/(xa,xb), 



(17) 



(18) 



as in Section IV-A Another way is to apply the estimator ([8]) 
to /a, i.e., to compute 



-Ia 



K\S 



K 

E 



which is also an estimate of l/Zf. The computation of 

A(x^'))=5]/(x(f\xB), 



(22) 



(23) 



which is required in (22i, is easy since the corresponding 
factor graph is a tree; in fact, the result of this computation is 
obtained for free as a by-product of tree-based Gibbs sampling 
as is explained in the appendix. By symmetry, we also have 
an analogous estimate T defined as 



1 



1 



KlSf 



K 

k=l JB[^B ) 



(24) 



With the same samples (x^^x^-*), (x^^x^-*). 



,(2) 



timating l/Zf from (22i and (24i tends to converge faster 
than ([8]l. More importantly for this paper, the direct Ogata- 
Tanemura method ([HJ cannot be used for constrained channels 
(cf the example in Section |llji where |iS/| = Zj is the desired 



quantity. In contrast, the computation of in (22i and 

\Sfg \ in (24 1 may be easy in such cases as will be exemplified 
below. 



V. 



Application to the Capacity of Noiseless 2-D 
Constrained Channels 



We demonstrate the estimators ( 22 1 and ( [24] i by their appli- 
cation to the example in Section |II[ the 2-D (1, oo) runlength- 
limited constrained channel. 

We will use factor graphs as in Fig. [T] with a partitioning 
as in Fig. |2] In this example, the quantities \Sf^ \ and |iS/b|, 
which are needed in ( [22] i and ( [24| , respectively, are given by 

(25) 



\SfJ = 5]/(xA,0) 

XA 



(26) 
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1 10 100 1000 10000 100000 lc+06 lc+07 

Number of Samples 

Fig. 3. Estimated noiseless capacity (in bits/symbol) vs. the number of 
samples K for a 10 X 10 grid with a (1, oo) constraint. The plot shows 10 
different sample paths, each with two estimates, one from Fa and another 
from r^. The horizontal dotted line shows the infinite-size capacity for this 
constraint. 



1 10 100 1000 10000 100000 le+06 le+07 le+08 

Number of Samples 

Fig. 4. Estimated noiseless capacity (in bits/symbol) vs. the number of 
samples K for a 60 X 60 grid with a (1, cjo) constraint. The plot shows 10 
different sample paths, each with two estimates, one from Ta and another 
from r^. The horizontal dotted line shows the infinite-size capacity for this 
constraint. 



Since 



1, if/A(xA)>0 
0, if/^(x^)=0, 



(27) 



/(XA,0)= 

and likewise for /(0,Xb). It follows that and \Sfg 
easy to compute by sum-product message passing in the cycle- 
free factor graphs of /(x^,0) and /(0,xb), respectively. 

Some experimental results are shown in Figs. |3] through [6] 
All figures refer to / as in (j4]) and (|5]l and show the estimates 
of the finite-size capacity Cm <|6| obtained from ( p2| and (j24]| 
vs. K for several different experiments. 

In Fig.|3] we have = lOx 10 and we obtain Cio « 0.6082 
bits/symbol. In Fig. [4] we have iV = 60 x 60, and there are 
issues with slow convergence. 

In order to speed up the mixing and thus improving the con- 
vergence, we now partition the factor graph into vertical strips 
each containing M x 2 or A/ x 3 variables (rather than M x 1 
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Fig. 5. Same conditions as in Fig.|4] but with strips of width I 



0.592 



1000 10000 
Number of Samples 



Fig. 6. Same conditions as in Fig. [4] but with strips of width three. 



variables as in Fig. |2]i. Exact sum-product message passing is 
still possible on such strips, e.g., by converting the strip into 
an equivalent cycle-free factor graph. The computation time 
is exponential in the width of the strip, but the faster mixing 
results in a substantial reduction of total computation time for 
strips of moderate width. 

Simulation results for strips of width 2 and 3 are shown in 
Figs.[5]and[6] respectively, both for a grid of size N — 60x 60. 
Note that the convergence is much better than in Fig. |4j and 
we obtain Cgo ~ 0.5914 bits/symbol. 

Also shown in these figures, as a horizontal dotted line, 
is the infinite-size capacity Coo ~ 0.5879 bits/symbol from 
1*3211, which (in this example) is a lower bound on the finite- 
size capacity. 

VI. Estimating the Information Rate of Noisy 2-D 
Source /Channel Models 

A. The Problem 

We now consider the problem of computing the information 
rate of noisy 2-D source / channel models. Although the focus 
of this paper is on constrained channels, the proposed approach 



Fig. 7. Extension of Fig. ^ to a factor graph of p(x)p(y|x) with p(y|x) 
as in ]29\ . 



Fig. 8. Uniform-input information rate (in bits/symbol) vs. SNR for a 24 X 24 
channel with a (l,oo) constraint and additive white Gaussian noise. The 
horizontal dotted line shows the noiseless capacity of this channel. 



can also be applied to other 2-D source / channel models such 
as 2-D channels with intersymbol interference. 

For a 2-D grid of size N — AI x M (as before), let 
X = {Xi, X2, ■ ■ ■ , Xn} be the source process (i.e., the input 
process of the channel) and let Y = {Yi,Y2, . . . , Yjy} be the 
output process of the channel; X takes values in X (as defined 
in Section [ill) ^ takes values in M^. We wish to compute 
the mutual information rate 

^ J(X; Y) - 1 (i? (Y) - (Y| X)) (28) 

for cases where p(x, y) has a suitable factor graph. In particu- 
lar, we will focus on the case where the channel is memoryless. 



N 



p(y| 



n=l 



and where the channel input distribution p(x) has a factor- 
ization with a factor graph as in FigJTl It then follows that 
p(x, y) has a factor graph as in Fig. l7| 

In many cases of practical interest, iJ(Y|X) is analytically 
available, see our numerical experiments in Section [VI-C| In 
such cases, the problem of computing the mutual information 



rate ( 28 1 reduces to computing 

i/(Y) = E[-log2p(Y)]. (30) 

If iJ(Y| X) is not analytically available, it can be computed 
by the same method as H{Y), see |2j Section III]. 

B. The Method 



As in |2j, we approximate the expectation in (30i by the 
empirical average 

1 ^ 

i/(Y)«--^log2(p(yW)), (31) 

£=1 

where samples y'^', y^^', . . . , y^^' are drawn according to 
p(y). The problem of estimating the mutual information rate 
is thus reduced to 



• creating samples y*^^) and 

• computing p(y'^^-') for each sample. 

If p(x, y) has a cycle-free factor graph (and if 
jAfil, |A'2|, . . . , |Ajv| are not too large), then both tasks can 
be carried out in a single-loop algorithm as in |2|. In this 
paper, however, we assume that no such factor graph exists 
and we propose a double-loop algorithm (with an outer loop 
and an inner loop) to carry out these tasks. In the outer loop, 
we create samples y^^-*, . . . ,y*-'^-' as follows. 

1) Draw samples x^^^ . . . , x^^^ according to p(x). In sim- 
ple cases (such as unconstrained channels with i.i.d. 
input), this may be trivial; in general, however, we do 



(29) 2) 



this by tree-based Gibbs sampling (as in Section IV-Ai 
using the factor graph of p(x). 

For £ — 1, . . . ,L, draw y^^^ from PY\:x_iy\x^^^), i.e., by 
simulating the channel with input x*^^^. 
In the inner loop, we compute an estimate of 



x) 



(32) 



the right-hand side of (32 1 



as follows. Note that, for fixed 
is the partition function Zf^ of 

/,(x)^p(x)pY|x(y<'V), (33) 

which has a suitable factor graph (as, e.g., in Fig. [7]i. In 
principle, we can thus estimate ([32]) by any of the methods 
of Section 



III 



It turns out, however, that only multilayer 
importance sampling is able to handle the more demanding 
cases (as will be explained in our numerical experiments in 
Section |VI-C[ ) while the other methods of Section [lll| suffer 
from slow and erratic convergence. 

C. Numerical Experiments 

In our numerical experiments, we consider a noisy version 
of the example in Section [ll| i.e., a noisy version of the 
2-D (1, 00) runlength-limited constrained channel. We assume 
that the channel input distribution p(x) is uniform over the 
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1 10 100 500 

Number of Samples 

Fig. 9. Estimated information rate (in bits/symbol) vs. the number of samples 
L for a noisy 24 X 24 (1, oo) constraint at dB. The plot shows 12 different 
sample paths. 



0.5S 




Number of Samples 

Fig. 10. Estimated information rate (in bits/symbol) vs. the number of 
samples L for a noisy 24 X 24 (1, oo) constraint at 6 dB. The plot shows 12 
different sample paths. 



allowed configurations, i.e., p{x) = P/(x) with / as in Q, 
and we assume that the noise is additive white Gaussian (and 
independent of X), i.e., p(y|x) is a product as in ( [29] l with 
factors 



p(2/«|a;„) 



and thus 



: exp 



V2~7 



i?(Y|X) 



20-2 ( 



Vn 



N 



log2(27recr^ 



We will use the signal-to-noise ratio (SNR) defined as 



SNR 



1 



(34) 



(35) 



(36) 



which we will specify in dB, i.e., 10 logj^oC'^NR). 

The grid size in all the plots is iV = 24 x 24 and the 
parameters ofj in ( 11 1 are set to aj = 2^^, for j — 1,2, . . . , J. 



Fig. [8] shows the computed information rate vs. the SNR 
over the interval [—10,8] dB. The horizontal dotted line 



in Fig. [8] shows the capacity of the noiseless version of this 
channel, which is about 0.596 bits per symbol. 

Figs. |9]and 10 illustrate the convergence of the outer loop 
of the proposed double-loop algorithm at dB and at 6 dB, 
respectively. Both figures show the estimated information rate 
vs. the number of samples L in (31 1 for 12 different sample 
paths. 

As for the inner loop, the required number of layers J 
in ( [T2I ) depends on the SNR. As the SNR increases (or 



equivalently as decreases), the function /^(x) in (33 1 tends 
to have more isolated modes. Therefore, in order to obtain 
a good approximation at each level of multilayer importance 
sampling, larger values of J are required for higher SNR. In 
our numerical experiments at dB and 6 dB, J is set to 3 and 
6, respectively. 

Fig. [TT| shows the convergence of logj Rj as in ( |l4| for 
j — 1, 2, . . . , 6, for a fixed output sample at 6 dB. 

The value of Zgj is estimated by the tree-based Ogata- 



Tanemura method of Section IV-B| Fig. [12] shows the con 
vergence of the estimate of \og2{Zgg)/N according to (22i 
for four different sample paths. 

D. Remarks 

In statistical physics, the partition function typically has the 
form 

Z=Y, e-^^^y^, (37) 

where T is the temperature and £'(x) is the energy of the 
configuration x. We therefore point out that the noise variance 



a in ( 34 1 may be viewed as the temperature parameter of the 



partition function Zf^ of (33 1. It is well known in statistical 
physics that computing the partition function is harder at low 
temperatures than at high temperatures. Similarly, we observe 



that computing the partition function Z/, of (33 1 is harder at 
high SNR than at low SNR; in particular, at high SNR, more 
layers (higher values of J) are required for multilayer (multi- 
temperature) importance sampling. 

We also note that, in the examples of Section |VI-C| the 
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is somewhat 



choice of the parameters Uj = 2~^ m 
arbitrary. It is possible that other choices of these parameters 
result in faster convergence. 

VII. Conclusion 

Monte Carlo methods have been highly succesful in comput- 
ing the information rate of source / channel models with 1 -D 
memory. The extension of such methods to source / channel 
models with 2-D memory has been an open research problem. 
In this paper, we develop such methods with a focus on the 
(difficult) case of channels with input constraints, with or 
without noise. In contrast to previous techniques, which either 
use generalized belief propagation or compute only bounds on 
the information rate, the Monte Carlo algorithms of this paper 
are guaranteed to converge (asymptotically) to the desired 
information rate. A key role in the proposed algorithms is 
played by tree-based Gibbs sampling by Hamze and de Freitas, 
which we have shown to yield an estimate of the partition 
function as a by-product. The success of the proposed methods 
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Fig. 11. Computed log2 Rj as in jl4| , for j = 1, 2, . . . , 6 vs. the number 
of samples K for a noisy 24 X 24 (i, oo) runlength-limited constraint at 6 
dB. The plot shows log2 Rq, log2 Rs, . . . , log2 Ri from top to bottom. 



0.565 

0.56 
0.555 

0.55 
0.545 

0.54 
0.535 

0.53 
0.525 

0.52 



100 1000 
Number of Samples 



Fig. 12. Estimated log2(Zgg)/Af vs. the number of samples K for a noisy 
24 X 24 (1, oo) runlength-limited constraint at 6 dB. 



is exemplified by Fig. |8] which (to the best of our knowledge) 
is the first such plot for a noisy 2-D channel. We also note that 
the extension of the proposed methods to computing upper and 
lower bounds on the information rate as in [2, Section VI] is 
straightforward. 
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Appendix 
Sampling from Markov Chains 

We recall some pertinent facts about the simulation of 
Markov chains and cycle-free factor graphs. Let p(x) ~ 
p{xi, . . . , Xn) be the probability mass function of a Markov 



Xk-2 I 1 Xk-1 



9k 



9k+i 



Fig. 13. Forney factor graph of j39| with messages Px^. j40| . 

chain. If p(x) is given in the form 

n 

Pi^) ^ P{xi)\\_p{xk\xk~i), (38) 

k=2 

then it is obvious how to draw i.i.d. samples according to p(x). 
Now consider the case where p(x) is not given in the form 
p8|), but in the more general form 



p(x) cx Jl 9k{xk-i,Xk) 



(39) 



fc=2 



with general factors gk- It is then still easy to draw i.i.d. 
samples according to p(x), which may be seen as follows. 



First, a probability mass function of the form (39 1 can be 
rewritten in the form ( |38| l (which allows efficient simulation). 
Second, this reparameterization of p(x) may be efficiently 
carried out by backward sum-product message passing, as 
will be detailed below. The resulting algorithm is known as 
"backward-filtering forward-sampling" (or, in a time-reversed 
version, as "forward-filtering backward-sampling") p9) . 
Specifically, let be the backward sum-product message 



along the edge Xk in the factor graph of ( 39 1, as is illustrated 



in Fig. 13 (cf. 1311). We then have "^x„(a^n) — 1 and 



t^xjxfe) = ^ gk+i{xk,Xk+i)')Ix^+i{xk+i) (40) 



— ^ ^ 9'fniXm — lT Xm) 

Xk-l-i,...,x„ m=k + l 

for fc = n — 1, n — 2 1. Then 



, . . . , 0.7^ y 



and 



p{xk\Xk-l) 



9k{Xk-l,Xk) ^J■Xki^k) 

Wfc-i(a;fe-i) 



(41) 

(42) 
(43) 

(44) 



for k = 2, . . . ,n. The proof of ( [44] i follows from noting that 

p{xk-i) = 77txfc_^(a;fe-i)Vxfc_i(a;/c-i) (45) 

and 

p{xk-i,Xk) ^ ■^~jixk-i{xk-i)9k{xk-i,Xk)^Xk{xk) (46) 
where 'jix^_i is the forward sum-product message along the 



edge Xk-i and where 7 is the missing scale factor in (39 1. 
We also note that 



XI Vxi(a;i) = ^^(x), 



(47) 



where ^(x) is defined as the right-hand side of ( [39| ). In this 
paper, this fact is used to compute the marginals (|23[) as a 
by-product of the sampling. 
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The generalization of all this to arbitrary factor graphs 
without cycles is straightforward. 
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